
    Towards computationally efficient neural networks with adaptive and dynamic computations

    Over the past few years, artificial intelligence has advanced considerably, and deep learning, in which deep neural networks loosely emulate the human brain, has contributed significantly to this progress. Given large amounts of data and sufficient computational resources, deep neural networks can now achieve great success. Despite this success, their ability to adapt quickly to new concepts, tasks, and environments is quite limited or even non-existent. In this thesis, we study how deep neural networks can adapt to continually changing or entirely new circumstances, as human intelligence does, and we introduce adaptive and dynamic architectural modules or meta-learning frameworks that make this adaptation computationally efficient.
This thesis consists of a series of studies proposing methods that use adaptive and dynamic computations to tackle adaptation problems from different perspectives: task-level, temporal-level, and context-level adaptation. In the first article, we focus on fast task-level adaptation based on a meta-learning framework. More specifically, we investigate the model uncertainty inherent in quickly adapting to a new task from only a few examples. We alleviate this problem by combining efficient gradient-based meta-learning with nonparametric variational inference in a principled probabilistic framework. The resulting Bayesian few-shot learning method, which prevents task-level overfitting, is an important step towards robust meta-learning. In the second article, we aim to improve sequence (i.e., future) prediction by introducing jumpy future prediction based on an adaptive step size. This is a critical ability for an intelligent agent exploring an environment, because it enables efficient option learning and jumpy future imagination. We make this possible with the Hierarchical Recurrent State Space Model (HRSSM), which discovers latent temporal structure (e.g., subsequences) while modeling its stochastic state transitions hierarchically. Finally, in the last article, we investigate a framework that adaptively captures the global context of image data and processes the data further based on that context. We implement this framework by extracting high-level visual concepts through attention modules and using graph-based reasoning to capture the global context from them. In addition, feature-wise transformations adaptively propagate the global context to all local descriptors.
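    The abstract does not name the nonparametric variational inference method it combines with gradient-based meta-learning. A widely used nonparametric method is Stein variational gradient descent (SVGD), which moves a set of particles toward a target posterior; assuming that is the flavor meant, the sketch below shows the SVGD update on a toy 1-D Gaussian target. In a meta-learning setting the particles would be copies of model parameters and the score would come from a task's few-shot likelihood; that wiring, the RBF kernel, and the median-heuristic bandwidth are all assumptions, not details from the thesis.

```python
import numpy as np


def rbf_kernel(x):
    """RBF kernel matrix and its gradient for particles x of shape (n, d).

    Returns K[j, i] = k(x_j, x_i) and grad_K[j, i] = d k(x_j, x_i) / d x_j.
    Bandwidth uses the median heuristic (an assumed, common default).
    """
    diffs = x[:, None, :] - x[None, :, :]          # (n, n, d): x_j - x_i
    sq_dists = np.sum(diffs ** 2, axis=-1)          # (n, n)
    h = max(np.median(sq_dists) / np.log(len(x) + 1), 1e-8)
    K = np.exp(-sq_dists / h)
    grad_K = (-2.0 / h) * diffs * K[:, :, None]     # (n, n, d)
    return K, grad_K


def svgd_step(x, grad_log_p, step_size):
    """One SVGD update x_i <- x_i + eps * phi(x_i), where
    phi(x_i) = mean_j [k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i)].
    """
    K, grad_K = rbf_kernel(x)
    driving = K.T @ grad_log_p(x)     # kernel-smoothed score: pulls toward high density
    repulsive = grad_K.sum(axis=0)    # kernel gradient: keeps particles spread out
    return x + step_size * (driving + repulsive) / len(x)


# Toy target N(mu, sigma^2), standing in for a task posterior.
mu, sigma = 2.0, 1.0
grad_log_p = lambda x: -(x - mu) / sigma ** 2

rng = np.random.default_rng(0)
particles = rng.normal(-3.0, 0.5, size=(50, 1))   # deliberately poor initialization
for _ in range(2000):
    particles = svgd_step(particles, grad_log_p, step_size=0.05)

print(particles.mean(), particles.std())
```

    Without the repulsive term every particle would collapse onto the mode, which is ordinary gradient ascent; the repulsion is what lets a finite particle set represent uncertainty.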

    Discrete denoising of heterogenous two-dimensional data

    We consider discrete denoising of two-dimensional data whose characteristics may vary abruptly between regions. Using a quadtree decomposition technique and space-filling curves, we extend the recently developed S-DUDE (Shifting Discrete Universal DEnoiser), which was tailored to one-dimensional data, to the two-dimensional case. Our scheme competes with a genie that has access not only to the noisy data but also to the underlying noiseless data, and that can employ m different two-dimensional sliding-window denoisers along m distinct regions obtained by a quadtree decomposition with m leaves, in a way that minimizes the overall loss. We show that, regardless of the underlying noiseless data, the two-dimensional S-DUDE performs essentially as well as this genie, provided that the number of distinct regions satisfies m = o(n), where n is the total size of the data. The resulting algorithm complexity remains linear in both n and m, as in the one-dimensional case. Our experimental results show that the two-dimensional S-DUDE can be effective when the characteristics of the underlying clean image vary across different regions in the data.
    Comment: 16 pages, submitted to IEEE Transactions on Information Theory
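    The abstract says space-filling curves carry the one-dimensional S-DUDE over to images. The specific curve is not stated here; assuming a Hilbert curve on a square image whose side is a power of two, the sketch below linearizes an image so that consecutive samples are spatial neighbors. Because a Hilbert curve finishes each aligned quadrant before moving to the next, every leaf of a quadtree decomposition maps to one contiguous segment of the scan. The function names are hypothetical, and the denoiser itself is not shown.

```python
def hilbert_d2xy(order, d):
    """Map position d along a Hilbert curve to (x, y) on a 2**order x 2**order grid."""
    n = 1 << order
    x = y = 0
    s = 1
    t = d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the current sub-square so consecutive sub-curves connect.
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_scan(image):
    """Flatten a square image (list of rows, side 2**k) along the Hilbert curve."""
    size = len(image)
    order = size.bit_length() - 1
    assert 1 << order == size, "side length must be a power of two"
    return [image[y][x] for x, y in (hilbert_d2xy(order, d) for d in range(size * size))]


# A 2x2 image is visited in the order (0,0), (0,1), (1,1), (1,0) as (x, y).
print(hilbert_scan([[0, 1], [2, 3]]))
```

    The resulting 1-D sequence can be fed to any one-dimensional sliding-window denoiser; with this ordering, most one-dimensional neighbors are also two-dimensional neighbors.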

    Visual Concept Reasoning Networks

    A split-transform-merge strategy has been broadly used as an architectural constraint in convolutional neural networks for visual recognition tasks. It approximates sparsely connected networks by explicitly defining multiple branches that simultaneously learn representations with different visual concepts or properties. However, dependencies or interactions between these representations are typically defined by dense, local operations without any adaptiveness or high-level reasoning. In this work, we propose to exploit this strategy and combine it with our Visual Concept Reasoning Networks (VCRNet) to enable reasoning between high-level visual concepts. We associate each branch with a visual concept and derive a compact concept state by selecting a few local descriptors through an attention module. These concept states are then updated by graph-based interaction and used to adaptively modulate the local descriptors. We describe the proposed model in terms of split-transform-attend-interact-modulate-merge stages, which are implemented with a highly modularized architecture. Extensive experiments on visual recognition tasks such as image classification, semantic segmentation, object detection, scene recognition, and action recognition show that VCRNet consistently improves performance while increasing the number of parameters by less than 1%.
    Comment: Preprint
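    The split-attend-interact-modulate-merge stages described above can be sketched on a single feature map. This is a minimal NumPy illustration under several assumptions: soft attention pooling stands in for "selecting a few local descriptors", a single residual message-passing round on a fully connected concept graph stands in for the graph-based interaction, and a FiLM-style scale-and-shift stands in for the modulation. The transform stage is omitted, and all parameter names and shapes are hypothetical rather than VCRNet's actual layers.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def concept_reasoning_block(x, num_concepts, params):
    """Split -> attend -> interact -> modulate -> merge on local descriptors.

    x: (N, C) array of N local descriptors with C channels. params holds
    hypothetical weights; the per-branch transform stage is omitted.
    """
    n, c = x.shape
    assert c % num_concepts == 0, "channels must split evenly into branches"
    cb = c // num_concepts
    # Split: one channel group (branch) per visual concept -> (B, N, cb).
    branches = x.reshape(n, num_concepts, cb).transpose(1, 0, 2)

    # Attend: a per-concept query scores each descriptor; attention pooling
    # produces a compact concept state.
    logits = np.einsum("bnc,bc->bn", branches, params["query"])
    attn = softmax(logits, axis=-1)
    states = np.einsum("bn,bnc->bc", attn, branches)  # (B, cb)

    # Interact: one residual message-passing round on a fully connected graph.
    adj = np.full((num_concepts, num_concepts), 1.0 / num_concepts)
    states = states + np.maximum(adj @ states @ params["w_graph"], 0.0)

    # Modulate: per-channel scale and shift applied to every descriptor.
    gamma = states @ params["w_gamma"]
    beta = states @ params["w_beta"]
    modulated = branches * (1.0 + gamma[:, None, :]) + beta[:, None, :]

    # Merge: concatenate the branches back to (N, C).
    return modulated.transpose(1, 0, 2).reshape(n, c)


rng = np.random.default_rng(0)
N, C, B = 49, 32, 4          # e.g. a 7x7 feature map, 32 channels, 4 concepts
cb = C // B
params = {
    "query": rng.normal(size=(B, cb)),
    "w_graph": rng.normal(scale=0.1, size=(cb, cb)),
    "w_gamma": rng.normal(scale=0.1, size=(cb, cb)),
    "w_beta": rng.normal(scale=0.1, size=(cb, cb)),
}
x = rng.normal(size=(N, C))
y = concept_reasoning_block(x, B, params)   # same shape as x
```

    Because the block maps (N, C) to (N, C), it can be dropped between existing layers. In this sketch the added weights scale with the branch width cb rather than the full channel count, which is consistent in spirit with the small parameter overhead the abstract reports.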

    Regularization and Kernelization of the Maximin Correlation Approach

    Robust classification becomes challenging when each class consists of multiple subclasses. Examples include multi-font optical character recognition and automated protein function prediction. In correlation-based nearest-neighbor classification, the maximin correlation approach (MCA) provides the worst-case optimal solution by minimizing the maximum misclassification risk through an iterative procedure. Despite its optimality, the original MCA has drawbacks that have limited its applicability in practice: it tends to be sensitive to outliers, cannot effectively handle nonlinearities in datasets, and has high computational complexity. To address these limitations, we propose an improved solution, the regularized maximin correlation approach (R-MCA). We first reformulate MCA as a quadratically constrained linear programming (QCLP) problem, incorporate regularization by introducing slack variables in the primal problem of the QCLP, and derive the corresponding Lagrangian dual. The dual formulation enables us to apply the kernel trick to R-MCA so that it can better handle nonlinearities. Our experimental results demonstrate that regularization and kernelization make the proposed R-MCA more robust and accurate than the original MCA on various classification tasks. Furthermore, as the data size or dimensionality grows, R-MCA runs substantially faster by solving either the primal or the dual of the QCLP, whichever has the smaller variable dimension.
    Comment: Submitted to IEEE Access
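    One way to read the maximin correlation objective is as follows: for a class, find a unit vector whose lowest correlation with any of the class's normalized training samples is as large as possible, then classify a query by its highest-correlation representative. The sketch below assumes that reading and uses plain projected subgradient ascent, which is neither the paper's iterative procedure nor its QCLP solver. It omits R-MCA's slack-variable regularization and kernelization, and the function names are hypothetical.

```python
import numpy as np


def maximin_direction(samples, iters=5000):
    """Approximate argmax over unit w of min_i <w, x_i / ||x_i||>.

    Projected subgradient ascent: nudge w toward the currently
    worst-correlated sample with a diminishing step, then renormalize.
    """
    x = samples / np.linalg.norm(samples, axis=1, keepdims=True)
    w = x.mean(axis=0)
    w /= np.linalg.norm(w)
    for t in range(iters):
        worst = np.argmin(x @ w)                  # sample with the lowest correlation
        w = w + x[worst] / np.sqrt(t + 1.0)       # diminishing step size
        w /= np.linalg.norm(w)                    # project back onto the unit sphere
    return w


def classify(query, class_directions):
    """Assign the query to the class whose representative correlates most."""
    q = query / np.linalg.norm(query)
    scores = {label: float(q @ w) for label, w in class_directions.items()}
    return max(scores, key=scores.get)


# Toy class with three sample directions at 0, 10 and 60 degrees; the
# worst-case-optimal direction bisects the two extremes (30 degrees).
angles = np.deg2rad([0.0, 10.0, 60.0])
samples = np.stack([np.cos(angles), np.sin(angles)], axis=1)
w = maximin_direction(samples)
angle = np.rad2deg(np.arctan2(w[1], w[0]))
```

    Unlike the mean direction (about 23 degrees here), the maximin direction ignores how samples are distributed inside the class and only guards the worst case. This is also why a single outlier can drag it, which is the sensitivity the abstract attributes to the original MCA.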

    Complementary Domain Adaptation and Generalization for Unsupervised Continual Domain Shift Learning

    Continual domain shift poses a significant challenge in real-world applications, particularly when labeled data are not available for new domains. Acquiring knowledge in this problem setting is referred to as unsupervised continual domain shift learning. Existing methods for domain adaptation and generalization are limited in addressing this issue because they focus either on adapting to a specific domain or on generalizing to unseen domains, but not both. In this paper, we propose Complementary Domain Adaptation and Generalization (CoDAG), a simple yet effective learning framework that combines domain adaptation and generalization in a complementary manner to achieve the three major goals of unsupervised continual domain shift learning: adapting to the current domain, generalizing to unseen domains, and preventing forgetting of previously seen domains. Our approach is model-agnostic, meaning that it is compatible with any existing domain adaptation and generalization algorithms. We evaluate CoDAG on several benchmark datasets and demonstrate that our model outperforms state-of-the-art models on all datasets and evaluation metrics, highlighting its effectiveness and robustness in handling unsupervised continual domain shift learning.
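    The three goals named above can be made measurable from an accuracy matrix recorded as domains arrive one after another. The sketch below assumes common continual-learning conventions: adaptation is accuracy on the domain just trained on, generalization is accuracy on domains not yet seen, and forgetting is the drop on earlier domains by the end of the sequence. These definitions and the function name are assumptions, not necessarily the metrics used in the paper, and the numbers in the example are made up for illustration.

```python
def continual_domain_metrics(acc):
    """acc[i][j]: accuracy on domain j after training through domain i (T x T).

    Returns (adaptation, generalization, forgetting) under assumed,
    commonly used definitions.
    """
    T = len(acc)
    # Adaptation: accuracy on the current domain right after adapting to it.
    adaptation = sum(acc[i][i] for i in range(T)) / T
    # Generalization: accuracy on domains that have not been seen yet (j > i).
    unseen = [acc[i][j] for i in range(T) for j in range(i + 1, T)]
    generalization = sum(unseen) / len(unseen) if unseen else float("nan")
    # Forgetting: drop from just-after-training to the final model, per earlier domain.
    drops = [acc[j][j] - acc[T - 1][j] for j in range(T - 1)]
    forgetting = sum(drops) / len(drops) if drops else float("nan")
    return adaptation, generalization, forgetting


# Made-up accuracies for three sequential domains (illustration only).
acc = [
    [0.90, 0.60, 0.50],
    [0.85, 0.88, 0.55],
    [0.80, 0.84, 0.86],
]
adaptation, generalization, forgetting = continual_domain_metrics(acc)
```

    The diagonal, the upper triangle, and the last row of the matrix each correspond to one of the three goals, which is why a method tuned for only one of them (pure adaptation or pure generalization) can look strong on one summary and weak on the others.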

    Meta-Learning with Adaptive Weighted Loss for Imbalanced Cold-Start Recommendation

    Sequential recommenders have made great strides in capturing a user's preferences. Nevertheless, cold-start recommendation remains a fundamental challenge because it typically involves limited user-item interactions for personalization. Recently, gradient-based meta-learning approaches have emerged in the sequential recommendation field due to their fast adaptation and ease of integration. These meta-learning algorithms formulate cold-start recommendation as a few-shot learning problem, where each user is represented as a task to be adapted to. While meta-learning algorithms generally assume that task-wise samples are evenly distributed over classes or values, user-item interactions in real-world applications do not conform to such a distribution (e.g., watching favorite videos multiple times, or leaving only positive ratings without any negative ones). Consequently, imbalanced user feedback, which accounts for the majority of task training data, may dominate the user adaptation process and prevent meta-learning algorithms from learning meaningful meta-knowledge for personalized recommendations. To alleviate this limitation, we propose a novel sequential recommendation framework based on gradient-based meta-learning that captures the imbalanced rating distribution of each user and computes an adaptive loss for user-specific learning. Our work is the first to tackle the impact of imbalanced ratings in cold-start sequential recommendation scenarios. Through extensive experiments on real-world datasets, we demonstrate the effectiveness of our framework.
    Comment: Accepted by CIKM 202
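    The core idea above, reweighting each user's loss according to that user's own rating distribution before the meta-learning inner loop, can be sketched with a linear rating predictor. The paper computes its weights adaptively; here a fixed inverse-frequency rule stands in for that. The linear model, the squared-error loss, and all function names are assumptions made for illustration.

```python
import numpy as np


def inverse_frequency_weights(ratings):
    """Per-sample weights inversely proportional to how often each rating
    value occurs in this user's support set, normalized to mean 1."""
    values, counts = np.unique(ratings, return_counts=True)
    freq = dict(zip(values.tolist(), counts.tolist()))
    w = np.array([1.0 / freq[r] for r in ratings.tolist()])
    return w * len(w) / w.sum()


def weighted_mse_and_grad(theta, features, ratings, weights):
    """Weighted squared error of a linear predictor and its gradient w.r.t. theta."""
    residual = features @ theta - ratings
    loss = np.mean(weights * residual ** 2)
    grad = 2.0 * features.T @ (weights * residual) / len(ratings)
    return loss, grad


def adapt_to_user(theta, features, ratings, inner_lr=0.01, steps=1):
    """MAML-style inner loop: a few gradient steps on the user's weighted loss."""
    weights = inverse_frequency_weights(ratings)
    for _ in range(steps):
        _, grad = weighted_mse_and_grad(theta, features, ratings, weights)
        theta = theta - inner_lr * grad
    return theta


# Hypothetical cold-start user: four 5-star ratings and a single 1-star rating.
ratings = np.array([5.0, 5.0, 5.0, 5.0, 1.0])
features = np.random.default_rng(0).normal(size=(5, 3))   # made-up item features
theta_user = adapt_to_user(np.zeros(3), features, ratings, inner_lr=0.01, steps=5)
```

    With these weights, the lone 1-star rating counts as much in the loss as the four 5-star ratings combined. Without reweighting, the majority rating would dominate the few adaptation steps, which is the failure mode the abstract describes.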